The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while prize money played only a minor role (16%). Although a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Recent advances in text-to-image generation have witnessed the rise of diffusion models, which act as powerful generative models. Nevertheless, it is not trivial to exploit such latent variable models to capture the dependency among discrete words while also pursuing complex visual-language alignment in image captioning. In this paper, we break with the deeply rooted conventions of learning a Transformer-based encoder-decoder and propose a new diffusion-model-based paradigm tailored for image captioning, namely Semantic-Conditional Diffusion Networks (SCD-Net). Technically, for each input image, we first retrieve semantically relevant sentences via a cross-modal retrieval model to convey comprehensive semantic information. These rich semantics then serve as a semantic prior that triggers the learning of a Diffusion Transformer, which produces the output sentence in a diffusion process. In SCD-Net, multiple Diffusion Transformer structures are stacked to progressively strengthen the output sentence with better visual-language alignment and linguistic coherence in a cascaded manner. Furthermore, to stabilize the diffusion process, a new self-critical sequence training strategy is designed to guide the learning of SCD-Net with the knowledge of a standard autoregressive Transformer model. Extensive experiments on the COCO dataset demonstrate the promising potential of diffusion models for the challenging image captioning task. Source code is available at \url{https://github.com/YehLi/xmodaler/tree/master/configs/image_caption/scdnet}.
Most multimodal multi-objective evolutionary algorithms (MMEAs) aim to find all global Pareto optimal sets (PSs) for a multimodal multi-objective optimization problem (MMOP). However, in real-world problems, decision makers (DMs) may also be interested in local PSs. Moreover, searching for both global and local PSs is the more general way of treating MMOPs and can be seen as a generalized MMOP. In addition, state-of-the-art MMEAs exhibit poor convergence on high-dimensional MMOPs. To address these two issues, this study proposes a novel coevolutionary framework for multimodal multi-objective optimization, termed CoMMEA, which better obtains both global and local PSs while also improving convergence on high-dimensional MMOPs. Specifically, CoMMEA introduces two archives into the search process and coevolves them simultaneously through effective knowledge transfer. The convergence archive helps CoMMEA quickly approach the Pareto optimal front (PF). The knowledge of the converged solutions is then transferred to the diversity archive, which uses a local convergence indicator and an $\epsilon$-dominance-based method to obtain global and local PSs effectively. Experimental results show that CoMMEA is competitive with seven state-of-the-art MMEAs on fifty-four complex MMOPs.
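To make the $\epsilon$-dominance idea concrete, the following is a minimal sketch of one common (additive) definition for minimization problems, alongside plain Pareto dominance. The function names and the additive form are assumptions chosen for illustration; the abstract does not specify which variant CoMMEA uses or how it is combined with the local convergence indicator.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Plain Pareto dominance for minimization: a is no worse than b in
    every objective and strictly better in at least one."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def epsilon_dominates(a: Sequence[float], b: Sequence[float], eps: float) -> bool:
    """Additive epsilon-dominance for minimization (an assumed variant):
    a is allowed to be worse than b by at most eps in each objective."""
    return all(ai - eps <= bi for ai, bi in zip(a, b))


if __name__ == "__main__":
    a, b = (1.0, 2.0), (0.95, 2.5)
    # a does not Pareto-dominate b, but it does so within a tolerance of 0.1.
    print(dominates(a, b), epsilon_dominates(a, b, 0.1))
```

Relaxing the comparison by a tolerance is what lets an archive treat solutions that are slightly worse than the global front, such as members of a local PS, differently from clearly inferior ones.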
As an effective method for delivering external materials into biological cells, microinjection has been widely applied in the biomedical field. However, the understanding of cell mechanical properties is still inadequate, which greatly limits the efficiency and success rate of injection. To address this, a new rate-dependent mechanical model based on membrane theory is proposed for the first time. In this model, an analytical equilibrium equation between the injection force and cell deformation is established by considering the speed effect of microinjection. Unlike the traditional membrane-theory-based model, the proposed model treats the elastic coefficient of the constitutive material as a function of the injection velocity and acceleration, which effectively captures the influence of speed on the mechanical responses and yields a more general and practical model. Using this model, other mechanical responses at different speeds can also be accurately predicted, including the distribution of membrane tension and stress and the deformed shape. To verify the validity of the model, numerical simulations and experiments are carried out. The results show that the proposed model matches the real mechanical responses well at different injection speeds.
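Outlier detection tasks play a critical role in AI safety, and handling them remains a great challenge. Observations show that deep neural network classifiers usually tend to incorrectly classify out-of-distribution (OOD) inputs into in-distribution classes with high confidence. Existing works attempt to solve the problem by explicitly imposing uncertainty on classifiers when OOD inputs are exposed to the classifier during training. In this paper, we propose an alternative probabilistic paradigm that is both practically useful and theoretically viable for OOD detection tasks. In particular, we impose statistical independence between inlier and outlier data during training, in order to ensure that inlier data reveals little information about OOD data to the deep estimator during training. Specifically, we estimate the statistical dependence between inlier and outlier data through the Hilbert-Schmidt Independence Criterion (HSIC), and we penalize this metric during training. We also associate our approach with a novel statistical test at inference time, in line with our principled motivation. Empirical results show that our method is effective and robust for OOD detection on various benchmarks. Compared with SOTA models, our approach achieves significant improvements in the FPR95, AUROC, and AUPR metrics. Code is available at \url{https://github.com/jylins/hone}.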
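As an illustration of the dependence measure named above, here is a minimal sketch of the standard biased empirical HSIC estimator, $\mathrm{tr}(KHLH)/(n-1)^2$, on paired samples. The Gaussian kernel, the bandwidth `sigma`, the normalization, and all function names are assumptions made for this example; the abstract does not say which kernel or estimator the authors use, nor how the penalty enters their training loss.

```python
import math
from typing import List, Sequence


def gaussian_gram(xs: Sequence[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Gram matrix of a Gaussian (RBF) kernel over a list of vectors."""
    def k(u, v):
        sq = sum((a - b) ** 2 for a, b in zip(u, v))
        return math.exp(-sq / (2.0 * sigma ** 2))
    return [[k(u, v) for v in xs] for u in xs]


def center(K: List[List[float]]) -> List[List[float]]:
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma: float = 1.0) -> float:
    """Biased empirical HSIC for n >= 2 paired samples xs[i], ys[i]."""
    n = len(xs)
    Kc = center(gaussian_gram(xs, sigma))
    L = gaussian_gram(ys, sigma)
    # tr(K H L H) equals the elementwise sum of (H K H) * L for symmetric L.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


if __name__ == "__main__":
    xs = [[0.0], [1.0], [2.0], [3.0]]
    print(hsic(xs, xs))               # strongly dependent pair: clearly positive
    print(hsic(xs, [[5.0]] * 4))      # constant second variable: approximately zero
```

In a training setting, a quantity of this kind, computed on features of inlier and outlier batches, would be added to the loss as a penalty term so that minimizing it pushes the two toward independence.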
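Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, and from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods such as ptychography are poised to revolutionize nanoscale materials characterization. However, the associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from a detector. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low-dose imaging using orders of magnitude less data than traditional methods require.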
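The adaptation of a Generative Adversarial Network (GAN) aims to transfer a pre-trained GAN to a given domain with limited training data. In this paper, we focus on the one-shot case, which is more challenging and has rarely been explored in previous works. We consider that the adaptation from the source domain to the target domain can be decoupled into two parts: the transfer of global style, such as texture and color, and the emergence of new entities that do not belong to the source domain. While previous works mainly focus on style transfer, we propose a novel and concise framework\footnote{\url{https://github.com/thevoidname/generalized-onerized-one-one-shot-gan-adaption}} to address the \textit{generalized one-shot adaptation} task for both style and entity transfer, in which a reference image and its binary entity mask are provided. Our core objective is to constrain the gap between the internal distributions of the reference and the syntheses using the sliced Wasserstein distance. To better achieve this, style fixation is first used to roughly obtain the exemplary style, and an auxiliary network is added to the original generator to disentangle entity and style transfer. Furthermore, to realize cross-domain correspondence, we propose a variational Laplacian regularization to constrain the smoothness of the adapted generator. Both quantitative and qualitative experiments demonstrate the effectiveness of our method in various scenarios.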
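For readers unfamiliar with the sliced Wasserstein distance mentioned above, the sketch below shows a generic Monte Carlo estimate between two equally sized point sets: project both sets onto random unit directions, sort the projections, and average the squared differences. The function name, the number of projections, and the use of raw point sets are assumptions for illustration only; in the paper the distance is applied to internal feature distributions of images, which this example does not model.

```python
import math
import random
from typing import Sequence


def sliced_wasserstein(xs: Sequence[Sequence[float]],
                       ys: Sequence[Sequence[float]],
                       n_proj: int = 64,
                       seed: int = 0) -> float:
    """Monte Carlo estimate of the sliced 2-Wasserstein distance between two
    point sets of equal size (hypothetical helper, not the authors' code)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized point sets")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_proj):
        # Draw a random direction on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in d))
        d = [c / norm for c in d]
        # In 1-D, optimal transport simply matches sorted samples.
        px = sorted(sum(a * b for a, b in zip(x, d)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, d)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return math.sqrt(total / n_proj)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    shifted = [[x + 1.0, y + 1.0] for x, y in pts]
    print(sliced_wasserstein(pts, pts))      # identical sets give 0.0
    print(sliced_wasserstein(pts, shifted))  # a shifted copy gives a positive value
```

The appeal of the sliced form is that each projection reduces the transport problem to sorting, which keeps the cost low enough to use as a training objective.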
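Multi-scale learning frameworks have been regarded as a capable class of models for boosting semantic segmentation. The problem is nevertheless not trivial, especially for real-world deployments, which often demand low inference latency. In this paper, we thoroughly analyze the design of convolutional blocks (the type of convolutions and the number of channels in them) and the ways of interacting across multiple scales, all from a lightweight standpoint for semantic segmentation. From these in-depth comparisons, we derive three principles and accordingly devise Lightweight and Progressively-Scalable Networks (LPS-Net), which expand the network complexity in a greedy manner. Technically, LPS-Net first capitalizes on the principles to build a tiny network. Then, LPS-Net progressively scales the tiny network to larger ones by expanding a single dimension at a time (the number of convolutional blocks, the number of channels, or the input resolution) to reach the best speed/accuracy tradeoff. Extensive experiments conducted on three datasets consistently demonstrate the superiority of LPS-Net over several efficient semantic segmentation methods. More remarkably, our LPS-Net achieves 73.4% mIoU on the Cityscapes test set at a speed of 413.5 FPS on an NVIDIA GTX 1080Ti, a 1.5% performance improvement and a 65% speed-up over the state-of-the-art STDC. Code is available at \url{https://github.com/yihengzhang-cv/lps-net}.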
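The greedy, one-dimension-at-a-time expansion described above can be sketched as follows. Everything here is hypothetical: the selection score (accuracy gain per unit of added latency), the step sizes, and the toy `toy_evaluate` function are stand-ins chosen for illustration, since the abstract does not state the actual criterion or how candidates are evaluated.

```python
import math
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]


def greedy_scale(start: Config,
                 steps: Config,
                 evaluate: Callable[[Config], Tuple[float, float]],
                 n_rounds: int) -> List[Config]:
    """Grow a network configuration by expanding exactly one dimension per round.

    `evaluate` returns (accuracy, latency) for a configuration. The score used
    to pick a dimension is an assumed tradeoff measure, not the paper's."""
    cfg = dict(start)
    history = [dict(cfg)]
    for _ in range(n_rounds):
        acc0, lat0 = evaluate(cfg)
        best, best_score = None, float("-inf")
        for dim, step in steps.items():
            cand = dict(cfg)
            cand[dim] += step
            acc, lat = evaluate(cand)
            score = (acc - acc0) / max(lat - lat0, 1e-9)
            if score > best_score:
                best, best_score = cand, score
        cfg = best
        history.append(dict(cfg))
    return history


def toy_evaluate(cfg: Config) -> Tuple[float, float]:
    """Toy stand-in for training and timing a model; not a real measurement."""
    acc = math.log(cfg["blocks"]) + math.log(cfg["channels"]) + math.log(cfg["resolution"])
    lat = cfg["blocks"] * cfg["channels"] ** 2 * cfg["resolution"] ** 2 * 1e-9
    return acc, lat


if __name__ == "__main__":
    start = {"blocks": 2, "channels": 16, "resolution": 256}
    steps = {"blocks": 1, "channels": 8, "resolution": 64}
    for cfg in greedy_scale(start, steps, toy_evaluate, n_rounds=4):
        print(cfg)
```

In practice the evaluation step would involve training each candidate and measuring its speed on the target hardware, which is why expanding only a single dimension per round keeps the search affordable.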
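The multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks, but the self-attention computation in the Transformer scales quadratically w.r.t. the number of input patches. Existing solutions therefore commonly apply down-sampling operations (e.g., average pooling) to the keys/values to dramatically reduce the computational cost. In this work, we argue that such an over-aggressive down-sampling design is not invertible and inevitably causes information loss, especially for high-frequency components in objects (e.g., texture details). Motivated by wavelet theory, we construct a new Wavelet Vision Transformer (\textbf{Wave-ViT}) that formulates invertible down-sampling with wavelet transforms and self-attention learning in a unified way. This proposal enables self-attention learning with lossless down-sampling over keys/values, facilitating the pursuit of a better efficiency-vs-accuracy trade-off. Furthermore, inverse wavelet transforms are leveraged to strengthen the self-attention outputs by aggregating local contexts with an enlarged receptive field. We validate the superiority of Wave-ViT through extensive experiments over multiple vision tasks (e.g., image recognition, object detection, and instance segmentation). Its performance surpasses that of state-of-the-art ViT backbones with comparable FLOPs. Source code is available at \url{https://github.com/yehli/imagenetmodel}.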
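To show why a wavelet transform gives invertible down-sampling where average pooling does not, the sketch below applies a single-level 2-D Haar transform to a small grid and reconstructs it exactly. The function names and the choice of the Haar wavelet are assumptions for illustration; the abstract does not state which wavelet Wave-ViT uses, and sub-band naming conventions (LH vs. HL) vary between references.

```python
from typing import List, Tuple

Grid = List[List[float]]


def haar_down(img: Grid) -> Tuple[Grid, Grid, Grid, Grid]:
    """Single-level 2-D Haar transform of an even-sized grid.

    Returns four half-resolution sub-bands (LL, LH, HL, HH). LL is a scaled
    2x2 average; the other three keep the detail that pooling would discard."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)
            r_lh.append((a - b + c - d) / 2)
            r_hl.append((a + b - c - d) / 2)
            r_hh.append((a - b - c + d) / 2)
        ll.append(r_ll); lh.append(r_lh); hl.append(r_hl); hh.append(r_hh)
    return ll, lh, hl, hh


def haar_up(ll: Grid, lh: Grid, hl: Grid, hh: Grid) -> Grid:
    """Inverse of haar_down: rebuild the full-resolution grid."""
    out = []
    for r in range(len(ll)):
        top, bottom = [], []
        for c in range(len(ll[0])):
            s, x, y, z = ll[r][c], lh[r][c], hl[r][c], hh[r][c]
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        out += [top, bottom]
    return out


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    bands = haar_down(img)
    print(haar_up(*bands) == img)  # the round trip recovers the input
```

Stacking the four sub-bands along the channel axis halves the spatial resolution of the keys/values without discarding information, whereas average pooling corresponds to keeping only a scaled copy of the LL band.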
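Prior works have proposed several strategies to reduce the computational cost of the self-attention mechanism. Many of them decompose the self-attention procedure into regional and local feature extraction procedures, each of which incurs a much smaller computational complexity. However, regional information is typically obtained only at the expense of undesirable information loss owing to down-sampling. In this paper, we propose a novel Transformer architecture that aims to mitigate the cost issue, named Dual Vision Transformer (Dual-ViT). The new architecture incorporates a critical semantic pathway that can more efficiently compress token vectors into global semantics with a reduced order of complexity. These compressed global semantics then serve as useful prior information for learning finer pixel-level details through a second, pixel pathway. The semantic pathway and the pixel pathway are integrated and jointly trained, spreading the enhanced self-attention information in parallel through both pathways. Dual-ViT is thereby able to reduce the computational complexity without compromising much accuracy. We empirically demonstrate that Dual-ViT achieves higher accuracy than SOTA Transformer architectures with reduced training complexity. Source code is available at \url{https://github.com/yehli/imagenetmodel}.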